MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads

Authors

  • Shanjiang Tang
  • Bu-Sung Lee
  • Bingsheng He
Abstract

MapReduce has become a widely used computing model for large-scale data processing in clusters and data centers. A MapReduce workload generally contains multiple jobs. Because of the general execution constraint that map tasks are executed before reduce tasks, different job execution orders in a MapReduce workload can yield significantly different performance and system utilization. This paper proposes a prototype system called MROrder that dynamically optimizes the job order for online MapReduce workloads. Moreover, MROrder is designed to be flexible with respect to the optimization metric, e.g., makespan or total completion time. The experimental results show that MROrder is able to improve system performance by up to 31% for makespan and 176% for total completion time.
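The effect of job order on both metrics can be illustrated with a small model. The sketch below is not MROrder's algorithm, which the abstract does not describe. It assumes a deliberately simplified cluster with a single map stage feeding a single reduce stage, with hypothetical per-job `(map_time, reduce_time)` pairs, and uses Johnson's rule (a classic two-stage flow-shop ordering rule) purely as an example of a makespan-oriented order. All function names are illustrative.

```python
from itertools import permutations


def schedule_metrics(jobs):
    """Return (makespan, total_completion_time) for jobs run in the given order.

    Each job is a (map_time, reduce_time) pair. Simplifying assumption: one map
    stage and one reduce stage, each processing one job at a time, and a job's
    reduce phase cannot start before its own map phase has finished.
    """
    map_end = 0      # time at which the map stage becomes free
    reduce_end = 0   # time at which the reduce stage becomes free
    total = 0        # sum of job completion times
    for map_time, reduce_time in jobs:
        map_end += map_time
        reduce_end = max(reduce_end, map_end) + reduce_time
        total += reduce_end
    return reduce_end, total


def johnson_order(jobs):
    """Order jobs by Johnson's rule for a two-stage flow shop (illustrative)."""
    # Jobs whose map phase is no longer than their reduce phase go first,
    # shortest map phase first.
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    # Remaining jobs go last, longest reduce phase first.
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return front + back


def best_makespan(jobs):
    """Brute-force the smallest makespan over all orders (small inputs only)."""
    return min(schedule_metrics(list(p))[0] for p in permutations(jobs))


# Hypothetical workload: (map_time, reduce_time) per job, in submission order.
jobs = [(8, 2), (3, 6), (5, 5), (1, 4)]

print(schedule_metrics(jobs))                 # (26, 75)
print(schedule_metrics(johnson_order(jobs)))  # (19, 51)
print(best_makespan(jobs))                    # 19
```

In this toy workload, reordering alone cuts the makespan from 26 to 19 time units and the total completion time from 75 to 51. An order tuned for one metric is not guaranteed to be best for the other, which is why flexibility across metrics matters.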

Similar resources

MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs

MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In...

FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads

Originally, MapReduce implementations such as Hadoop employed First In First Out (FIFO) scheduling, but such simple schemes cause job starvation. The Hadoop Fair Scheduler (HFS) is a slot-based MapReduce scheme designed to ensure a degree of fairness among the jobs, by guaranteeing each job at least some minimum number of allocated slots. Our prime contribution in this paper is a different, fle...

Shared Execution of Recurring Workloads in MapReduce

With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commo...

OctopusDB: flexible and scalable storage management for arbitrary database engines

We live in a dynamic age with the economy, the technology, and the people around us changing faster than ever before. Consequently, the data management needs in our modern world are much different than those envisioned by the early database inventors in the 70s. Today, enterprises face the challenge of managing ever-growing dataset sizes with dynamically changing query workloads. As a result, m...

Towards Understanding Cloud Performance Tradeoffs Using Statistical Workload Analysis and Replay

Cloud computing has given rise to a variety of distributed applications that rely on the ability to harness commodity resources for large scale computations. The inherent performance variability in these applications’ workload coupled with the system’s heterogeneity render ineffective heuristics-based design decisions such as system configuration, application partitioning and placement, and job...

Publication date: 2013